A Practical Review of Key Concepts
2025-08-29
A physician has been collecting data on adolescent patients’ body mass indices (BMIs). The physician is now trying to determine whether BMI is associated with glycated hemoglobin (HbA1c) levels. It is hypothesized that there should be an association because higher BMIs are associated with type 2 diabetes mellitus and because people with uncontrolled diabetes mellitus have elevated HbA1c levels. The physician has collected data on 300 patients over the past 6 months.
Of the following, the variable type that BEST describes BMI and HbA1c is:
A researcher is designing a clinical trial and developing the data collection forms. The researcher wants to collect information regarding household income, and, instead of leaving a blank area for the participant to report their income, the researcher provides the following choices: less than $50,000, $50,000 to $75,000, $75,001 to $100,000, and greater than $100,000.
Of the following, the BEST way to describe this income variable is:
A. continuous variable: This would be correct if the researcher had asked participants to write in their exact household income (e.g., “$81,100.16”).
B. dichotomous variable: This is a specific type of categorical variable with only two possible outcomes. This would be correct if the choices were simply “Income below $75,000” and “Income $75,000 or above.”
C. nominal variable: This would be correct if the categories had no logical order, for example, if the question were “What is your primary source of income?” with categories such as “Wages,” “Investments,” “Retirement,” and “Self-Employment.”
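The four income choices have a natural order, but their widths are unequal and the top bracket is open-ended. The short Python sketch below shows what that structure does and does not support. The `IncomeBracket` codes and the seven responses are hypothetical illustrations, not data from the trial, and the integers encode order only.

```python
from enum import IntEnum
from statistics import median_low


class IncomeBracket(IntEnum):
    """Hypothetical coding of the four income choices; the integers encode ORDER only."""
    UNDER_50K = 1         # less than $50,000
    FROM_50K_TO_75K = 2   # $50,000 to $75,000
    FROM_75K_TO_100K = 3  # $75,001 to $100,000
    OVER_100K = 4         # greater than $100,000


# Hypothetical responses from seven participants (illustrative only)
responses = [
    IncomeBracket.UNDER_50K,
    IncomeBracket.FROM_75K_TO_100K,
    IncomeBracket.FROM_50K_TO_75K,
    IncomeBracket.OVER_100K,
    IncomeBracket.FROM_50K_TO_75K,
    IncomeBracket.FROM_75K_TO_100K,
    IncomeBracket.FROM_50K_TO_75K,
]

# Ordering is meaningful, so comparisons are valid...
print(IncomeBracket.OVER_100K > IncomeBracket.UNDER_50K)

# ...and so is a "middle" category: median_low always returns an actual bracket.
typical = median_low(responses)
print(typical.name)
```

Averaging the integer codes would quietly assume that the gaps between brackets are equal, which the bracket definitions do not support. The median category avoids that assumption.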
A hospitalist is working on a quality improvement project related to bronchiolitis. A team member has proposed an intervention package that they believe will reduce length of stay by 25%. The hospitalist is gathering length-of-stay data from the past 3 bronchiolitis seasons to set the baseline metric for the project. The histogram and descriptive statistics are shown below.
Of the following, the MOST accurate assessment of the baseline data is that:
A. the large number of outliers precludes accurate estimation: This is incorrect. While outliers complicate analysis, they don’t make it impossible. Using robust statistics like the median allows for an accurate estimation of the central tendency.
B. the mean is the most appropriate measure: This would be correct if the histogram showed a symmetric, bell-shaped (normal) distribution. In that case, the mean and median would be nearly identical and either would be appropriate.
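D. there is a 95% certainty that the true mean is between 30 and 88 hours: This is a loose description of a 95% confidence interval. Strictly, a 95% confidence interval comes from a procedure that captures the true mean in 95% of repeated samples; it is not a 95% probability statement about any single interval. While this is a valid statistical concept, we cannot determine its value from the information given in the question, and it doesn’t address the primary issue of choosing the best measure of central tendency for this dataset.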
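The following minimal Python sketch shows why a skewed length-of-stay distribution favors the median. The values are hypothetical and are not the project's baseline data. They were chosen only to mimic a right-skewed histogram with two prolonged admissions.

```python
from statistics import mean, median

# Hypothetical length-of-stay values in hours (NOT the project's data),
# right-skewed by two prolonged admissions at the end of the list.
los_hours = [30, 34, 38, 40, 42, 45, 48, 52, 60, 200, 310]
typical_only = los_hours[:-2]  # the same stays without the two prolonged ones

print(f"all stays:        mean = {mean(los_hours):.1f} h, median = {median(los_hours):.1f} h")
print(f"without outliers: mean = {mean(typical_only):.1f} h, median = {median(typical_only):.1f} h")
```

Dropping the two long stays moves the mean by almost 40 hours but moves the median by only 3. That stability is why the median is the better baseline for skewed data.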
Data obtained on elements of the complete blood count in 2- to 24-month-old infants and children with viral infections are being compared with data in those with confirmed bacterial infection.
Of the following, and assuming continuous variables and a normal distribution, the measure of central tendency that yields the BEST descriptive information for evaluating the components of the complete blood count in this study is:
B. median: This would be the best choice if the data were skewed, as we saw in the previous question.
C. mode: This is the most frequent value. It’s most useful for nominal (categorical) data, for example, to describe the most common blood type in a sample. It is rarely the best measure for continuous data.
D. range: This is a measure of dispersion or spread (Max - Min), not central tendency. It tells you how spread out the data is, but not where the center is. It would be used alongside the mean to describe the data, but it is not a measure of the center itself.
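Python's `statistics` module can illustrate the difference between center, spread, and the most common category. The white blood cell counts and blood types below are hypothetical values invented for illustration.

```python
from statistics import mean, median, mode

# Hypothetical, roughly symmetric WBC counts (x10^3/uL); illustrative only.
wbc = [9.8, 11.2, 12.0, 12.5, 13.1, 13.9, 14.6]

# Center: for symmetric data the mean and median nearly coincide.
print(f"mean = {mean(wbc):.2f}, median = {median(wbc):.2f}")

# Spread, not center: range = maximum - minimum.
print(f"range = {max(wbc) - min(wbc):.1f}")

# Mode suits nominal data, e.g., the most common blood type (hypothetical sample).
blood_types = ["O+", "A+", "O+", "B+", "O+", "A+"]
print(f"mode = {mode(blood_types)}")
```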
An 11-year-old boy is experiencing unexplained weight loss. At his visit this year, he has lost 3.63 kg (8.00 lb). His current height and weight are shown in the growth chart below. The chart indicates his weight is at the 5th percentile.
Of the following, assuming weights are distributed normally, this boy’s weight falls:
B. 2 to 3 standard deviations from the mean: This would be correct if his weight were extremely low, falling between about the 2nd percentile (-2 SD, more precisely 2.3%) and about the 0.1st percentile (-3 SD, more precisely 0.13%).
C. greater than 3 standard deviations from the mean: This would be correct for a truly extreme value, either very high (>99.9th percentile) or very low (<0.1st percentile).
D. less than 1 standard deviation from the mean: This would be correct if his weight were closer to average, for example, at the 25th percentile, which falls between the 16th percentile (about -1 SD) and the 50th percentile (the mean).
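You can check the percentile-to-SD conversions in these explanations with Python's built-in `statistics.NormalDist`. Like the question, this assumes that weights are normally distributed.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1, so values are z-scores

# z-scores for the percentiles discussed above
print(f"5th percentile -> z = {std_normal.inv_cdf(0.05):.3f}")
print(f"3rd percentile -> z = {std_normal.inv_cdf(0.03):.3f}")

# Percentile located at each whole SD below the mean
for k in (1, 2, 3):
    print(f"-{k} SD -> {100 * std_normal.cdf(-k):.2f}th percentile")

# Share of values within +/- k SD (the 68-95-99.7 rule)
for k in (1, 2, 3):
    print(f"within +/-{k} SD: {100 * (std_normal.cdf(k) - std_normal.cdf(-k)):.1f}%")
```

The 5th percentile maps to a z-score of about -1.645, which is between 1 and 2 SD below the mean.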
A screening program conducted at a high school is screening student athletes with electrocardiography to identify undiagnosed heart disease. Of 1,000 students, 2 students are found to have a prolonged QT interval.
Of the following, the parameter of long QT syndrome that can BEST be calculated by these data is the:
A. incidence: This would be correct if the study stated: “We followed 1,000 initially healthy athletes for one year, and during that year, 2 athletes developed long QT syndrome for the first time.”
B. odds ratio: This is a measure of association, typically from a case-control study. You would need two groups (e.g., students with long QT and students without) and you would compare the odds of a certain exposure (e.g., family history) between the two groups.
D. relative risk: This is a measure of association, typically from a cohort study. You would need to follow two groups over time (e.g., one group exposed to a risk factor, one group not exposed) and compare the incidence of the disease in each group.
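The arithmetic for a point prevalence is simple enough to show directly. This Python sketch assumes that the 2 prolonged-QT findings are existing cases among the 1,000 students screened at a single point in time.

```python
# Point prevalence = existing cases / population examined at one point in time.
cases_found = 2
students_screened = 1_000

prevalence = cases_found / students_screened
print(f"prevalence = {prevalence:.1%}")
print(f"           = {prevalence * 100_000:.0f} per 100,000 students")
```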
A recent article described the epidemiology of celiac disease in an inner-city community. The study evaluated a city with 50,000 children who did not have celiac disease at the start of the study. Over the 5-year period, the following numbers of children were newly diagnosed with celiac disease each year:
Year 1: 5; Year 2: 5; Year 3: 5; Year 4: 10; Year 5: 10
Of the following, the annual INCIDENCE rate of celiac disease per 100,000 patients in this study is:
A. 5: This is the number of new cases in Year 1, not the average annual rate.
B. 7: This is the correct average number of new cases per year (35 / 5), but it is a count among 50,000 children, not a rate per 100,000. A common mistake is to stop the calculation before scaling to the 100,000 denominator.
D. 35: This is the total number of new cases over the entire 5-year period, not the annual rate.
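The Python sketch below works through the full incidence calculation with the yearly counts from the question. It makes one simplifying assumption: the at-risk population stays at a constant 50,000 children each year. This ignores the small number of children who leave the at-risk pool once diagnosed.

```python
new_cases_by_year = [5, 5, 5, 10, 10]  # Years 1-5, from the study description
children_at_risk = 50_000              # celiac-free at the start of the study

years = len(new_cases_by_year)
total_new = sum(new_cases_by_year)     # new cases over the whole period
average_per_year = total_new / years   # new cases per year

# Simplifying assumption: the at-risk pool stays ~50,000 every year,
# so the per-year rate is average new cases / 50,000, rescaled to 100,000.
rate_per_100k = average_per_year / children_at_risk * 100_000
print(f"{total_new} cases / {years} years = {average_per_year:.0f} per year")
print(f"annual incidence = {rate_per_100k:.0f} per 100,000 children")
```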
An infant is admitted to the intensive care unit with septic shock. A recent study of a new blood test for acute kidney injury presented the following information: among 200 children admitted with septic shock, 100 developed acute kidney injury. Of those who developed acute kidney injury, the test was positive in 70 children. However, the test was also positive in 10 children who did not develop acute kidney injury.
Of the following, based on these results, the SENSITIVITY of the new blood test for detecting acute kidney injury is:
| | Has AKI | No AKI | Total |
|---|---|---|---|
| Test Positive | 70 (TP) | 10 (FP) | 80 |
| Test Negative | 30 (FN) | 90 (TN) | 120 |
| Total | 100 | 100 | 200 |
A. 10%: This is the false positive rate (10 false positives / 100 patients without AKI), which equals 1 - specificity.
B. 30%: This is the false negative rate (30 false negatives / 100 patients with AKI), which equals 1 - sensitivity.
D. 90%: This is the specificity of the test (90 true negatives / 100 patients without AKI). Specificity answers: “Of all the people without the disease, what percentage test negative?”
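The following small Python sketch covers the 2 x 2 table arithmetic behind these answer choices. The `TwoByTwo` class is a hypothetical helper written for illustration and is not part of any library. The counts come from the acute kidney injury table above.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Hypothetical helper for a diagnostic 2 x 2 table."""
    tp: int  # test positive, disease present
    fp: int  # test positive, disease absent
    fn: int  # test negative, disease present
    tn: int  # test negative, disease absent

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)  # true positive rate

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)  # true negative rate

    @property
    def false_positive_rate(self) -> float:
        return self.fp / (self.fp + self.tn)  # equals 1 - specificity

    @property
    def false_negative_rate(self) -> float:
        return self.fn / (self.fn + self.tp)  # equals 1 - sensitivity


aki = TwoByTwo(tp=70, fp=10, fn=30, tn=90)
for name in ("sensitivity", "specificity", "false_positive_rate", "false_negative_rate"):
    print(f"{name}: {getattr(aki, name):.0%}")
```

The same helper works for any 2 x 2 diagnostic table.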
A research article describes a potential new serum screening test for eosinophilic esophagitis (EoE). One hundred children are recruited. Of the 25 with EoE, the screening test was positive in 20 and negative in 5. Of the 75 without EoE, the screening test was negative in 65 and positive in 10.
Of the following, the sensitivity and specificity of the screening test are:
| | Has EoE (Disease) | No EoE (Healthy) | Total |
|---|---|---|---|
| Test Positive | 20 (TP) | 10 (FP) | 30 |
| Test Negative | 5 (FN) | 65 (TN) | 70 |
| Total | 25 | 75 | 100 |
Calculations: Sensitivity = TP / (TP + FN) = 20 / 25 = 80%. Specificity = TN / (TN + FP) = 65 / 75 ≈ 86.7%.
The health editor of a parenting magazine inquires about a study that was recently published. The study examined the relationship between toddler thumb-sucking and later food allergy. The scatterplot below shows the study results.
Of the following, based on the provided scatterplot, the MOST appropriate conclusion to draw from this study is that thumb-sucking:
B. is a result of food allergy: This simply reverses the direction of the proposed causal pathway without any supporting evidence, so it is still an unsupported causal claim.
C. is unrelated to the incidence of food allergy: This is factually incorrect. The scatterplot clearly shows a relationship (a negative correlation). If it were unrelated, the data points would be scattered randomly with no discernible trend.
D. lowers the incidence of food allergy: This is a causal claim. The word “lowers” implies that thumb-sucking is the active agent causing the effect. While this might be a hypothesis for a future study (like a randomized controlled trial), this observational data cannot support this strong conclusion.
Variable Types Matter: Choose the right statistical approach based on whether your data is continuous, discrete, nominal, or ordinal
Central Tendency: Use the mean for normal distributions and the median for skewed distributions
Standard Deviations: In normal distributions, about 68% of values fall within ±1 SD, 95% within ±2 SD, and 99.7% within ±3 SD
Prevalence vs. Incidence: Prevalence is a snapshot, incidence measures new cases over time
Diagnostic Tests: Sensitivity = true positive rate, Specificity = true negative rate
Association ≠ Causation: Observational studies can show correlation but cannot prove causation
Austin Meyer, MD, PhD, MS, MPH, MS
Remember: The goal is not just to know statistics, but to apply them wisely in clinical practice.